A Landmark-based Model of Speech Perception: History and Recent Developments

Authors

  • Janet Slifka
  • Kenneth N. Stevens
  • Sharon Manuel
  • Stefanie Shattuck-Hufnagel
Abstract

This paper traces some of the history of the development of a model for speech perception in which words are assumed to be represented as sequences of bundles of binary distinctive features. In the model, probability estimates for feature values are derived from measurements of acoustic attributes in the vicinity of acoustic “landmarks.” Landmarks are detected based on amplitude changes in various energy bands, and landmarks or pairs of landmarks provide evidence for the existence of feature bundles. This paper discusses data and thought processes that prompted four significant changes in the formulation of the model over the past decade: rule-generated changes in segments versus modifications of cues for features, landmark detection, principles of cue selection, and the role of analysis-by-synthesis in verifying word hypotheses. The model currently allows for refinement of a cohort of words alongside the landmark-to-feature estimation process. In this view, initial estimates of some features based on local analysis of the signal are used to propose a cohort. The words in the cohort are used to inform landmark/feature estimation based on a more extended context. This process iterates between improving feature estimation and refining the cohort to arrive at the model output.

INTRODUCTION

This paper provides a brief review of the history of the development of a model of lexical access and discusses data and thought processes that prompted changes in the original formulation of the model. The model incorporates methods of signal processing that are derived from knowledge of relations between articulation, acoustics, and perception (the Lexical Access Project). The Lexical Access Project at MIT began in the early 1990s with the assumption that the lexicon is represented in terms of sequences of “segments,” or bundles of binary distinctive features (Jakobson et al., 1952), where a change in the value of a distinctive feature potentially leads to a new word.
These features are grouped into two classes: articulator-free features and articulator-bound features (Table 1). Articulator-free features are not associated with a particular articulator (Halle, 1992) but are associated with general characteristics of constrictions within the vocal tract and the corresponding acoustic patterns formed by these constrictions. These features classify segments into broad classes which might be described as vowels and some general classes of consonants. Articulator-bound features specify which articulators are active during production, and how these articulators are shaped or positioned.

Slifka et al. Lexical Access Project. From Sound to Sense: June 11 – June 13, 2004 at MIT. C-86

One of the fundamental assumptions of the model is that there are locations, or landmarks, in the acoustic signal which are particularly rich in information related to the configurations of the larynx and of the various components of the articulatory tract. This information can be expressed in terms of acoustic cues to the distinctive features. Landmarks occur at times of (a) consonant closures, (b) consonant releases, (c) glide minima, and (d) vowel maxima. Landmarks are thought to provide initial information about the manner of articulation and the order and number of intended segments, and to create a road-map for subsequent acoustic analysis (Stevens, 2002). Because these locations are likely to be acoustic discontinuities or distinct minima/maxima in energy, they should be robustly identifiable to both a listener and an algorithm. More importantly, they bear a privileged relationship to the underlying segments of the utterance, in the sense that each landmark locates a region in the signal where the cues for the various features tend to be concentrated.
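As a rough illustration of how an abrupt change in band energy might be located automatically, the following is a minimal sketch. It assumes a single band whose frame-by-frame energy is already available in dB. The function name, the `step` and `threshold_db` values, and the choice to place one landmark at the midpoint of each run of large changes are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple


def detect_abrupt_landmarks(
    energy_db: List[float],
    step: int = 2,              # assumed half-width (in frames) of the difference
    threshold_db: float = 9.0,  # assumed minimum change; illustrative value only
) -> List[Tuple[int, str]]:
    """Return (frame_index, sign) pairs for abrupt energy changes in one band.

    sign is "+" for an abrupt rise in energy and "-" for an abrupt fall.
    """
    n = len(energy_db)
    # Centered difference; frames too close to either edge keep a value of 0.
    diffs = [0.0] * n
    for i in range(step, n - step):
        diffs[i] = energy_db[i + step] - energy_db[i - step]

    def sign_of(d: float) -> str:
        if d >= threshold_db:
            return "+"
        if d <= -threshold_db:
            return "-"
        return ""

    landmarks: List[Tuple[int, str]] = []
    run_start, run_sign = 0, ""
    # Walk one position past the end so that a run touching the edge is closed.
    for i in range(n + 1):
        s = sign_of(diffs[i]) if i < n else ""
        if s != run_sign:
            if run_sign:
                # One landmark per run, placed at the midpoint of the run.
                landmarks.append(((run_start + i - 1) // 2, run_sign))
            run_start, run_sign = i, s
    return landmarks


# Toy contour: high energy, a 10-frame low-energy interval, high energy again.
contour = [60.0] * 10 + [30.0] * 10 + [60.0] * 10
print(detect_abrupt_landmarks(contour))
```

In this toy contour the fall and the rise each yield one candidate landmark, which is the kind of pattern that might accompany a closure followed by a release. A fuller detector would combine evidence from several bands and would also need separate handling for the energy minima and maxima associated with glides and vowels.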
Other acoustic discontinuities, such as the onset of voicing or closure/opening of the velopharyngeal port, also provide useful information, but do not have the same privileged relationship to underlying contrastive segments (Stevens, 2002).

Much of the research on this project in the 1990s and early 2000s was concerned with the development of methods for discerning (1) landmarks and (2) cues for the identification of articulator-bound features by examining the signal in the vicinity of these landmarks. The goal was to automate this process, and to derive from these cues the sequences of bundles of distinctive features that can then be used to access words and word sequences from the lexicon. Modules for several different types of landmarks and several features were developed, including consonant landmarks, vowel landmarks, landmarks for glides, place features for stop consonants, the feature [nasal], the voicing feature, and place features for vowels. Some of these modules involved automatic procedures, and some required hand extraction of cues. The accuracy with which these various landmarks and features could be identified in these modules ranged from about 60 percent to 95 percent for running speech.

Table 1. List of distinctive features for English as grouped by articulator-free and articulator-bound classes.

Articulator-free features: Vowel, Consonant, Continuant, Sonorant, Strident
Articulator-bound features (vowel and glide; consonant): High, Low, Back, Adv. tongue root, Spread glottis, Lips, Tongue blade, Tongue body, Round, Anterior, Lateral, Rhotic, Nasal
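The idea of words as sequences of feature bundles, and of a cohort narrowing as more features are estimated, can be sketched as follows. This is a simplified illustration under stated assumptions: the three-word lexicon and its feature values are hypothetical and cover only a few of the features in Table 1, feature estimates are treated as hard +1/-1 decisions instead of the probability estimates the model uses, and the names `LEXICON`, `is_compatible`, and `propose_cohort` are invented for this example.

```python
from typing import Dict, List

Bundle = Dict[str, int]  # feature name -> +1 or -1; absent means unspecified

# Hypothetical, heavily simplified lexicon (illustrative feature values only).
HIGH_FRONT_VOWEL: Bundle = {"vowel": 1, "high": 1, "low": -1, "back": -1}
LEXICON: Dict[str, List[Bundle]] = {
    "bee": [{"consonant": 1, "sonorant": -1, "continuant": -1, "nasal": -1},
            HIGH_FRONT_VOWEL],
    "me":  [{"consonant": 1, "sonorant": 1, "continuant": -1, "nasal": 1},
            HIGH_FRONT_VOWEL],
    "see": [{"consonant": 1, "sonorant": -1, "continuant": 1, "strident": 1},
            HIGH_FRONT_VOWEL],
}


def is_compatible(estimate: Bundle, target: Bundle) -> bool:
    """True if no estimated feature contradicts a feature the target specifies."""
    return all(target[f] == v for f, v in estimate.items() if f in target)


def propose_cohort(estimates: List[Bundle],
                   lexicon: Dict[str, List[Bundle]]) -> List[str]:
    """Return the words whose bundles are consistent with the partial estimates."""
    cohort = []
    for word, segments in lexicon.items():
        if len(segments) != len(estimates):
            continue  # the landmarks suggest a different number of segments
        if all(is_compatible(e, s) for e, s in zip(estimates, segments)):
            cohort.append(word)
    return cohort


# First pass: only articulator-free features are estimated.
first_pass = [{"consonant": 1}, {"vowel": 1}]
# Later pass: further features have been estimated for the first segment.
later_pass = [{"consonant": 1, "sonorant": -1, "continuant": -1}, {"vowel": 1}]

print(propose_cohort(first_pass, LEXICON))
print(propose_cohort(later_pass, LEXICON))
```

Each added feature estimate can only shrink the cohort or leave it unchanged, which mirrors the alternation between feature estimation and cohort refinement described in the abstract. The step in which cohort members guide further acoustic analysis is not modeled here.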

Publication date: 2004